ABSTRACT

Recently, there has been significant interest in the study of the community search problem in social and information networks: given one or more query nodes, find densely connected communities containing the query nodes. However, most existing studies do not address the “free rider” issue, that is, nodes far away from query nodes and irrelevant to them are included in the detected community. Some state-of-the-art models have attempted to address this issue, but not only are their formulated problems NP-hard, they also do not admit any approximations without restrictive assumptions, which may not always hold in practice.

In this paper, given an undirected graph G and a set of query nodes Q, we study community search using the k-truss based community model. We formulate our problem of finding a closest truss community (CTC) as finding a connected k-truss subgraph with the largest k that contains Q and has the minimum diameter among such subgraphs. We prove this problem is NP-hard. Furthermore, it is NP-hard to approximate the problem within a factor (2 - ε), for any ε > 0. However, we develop a greedy algorithmic framework, which first finds a connected k-truss with the largest k containing Q, and then iteratively removes the nodes furthest from Q from the graph. The method achieves a 2-approximation to the optimal solution. To further improve the efficiency, we make use of a compact truss index and develop efficient algorithms for k-truss identification and maintenance as nodes get eliminated. In addition, using bulk deletion optimization and local exploration strategies, we propose two more efficient algorithms. One of them trades some approximation quality for efficiency while the other is a very efficient heuristic. Extensive experiments on 6 real-world networks show the effectiveness and efficiency of our community model and search algorithms.

1. INTRODUCTION 

Community structures naturally exist in many real-world networks such as social, biological, collaboration, and communication networks. The task of community detection is to identify all communities in a network, which is a fundamental and well-studied problem in the literature. Recently, several papers have studied a related but different problem called community search, which is to find the community containing a given set of query nodes. The need for community search naturally arises in many real application scenarios, where one is motivated by the discovery of the communities in which given query nodes participate. Since the communities defined by different nodes in a network may be quite different, community search with query nodes opens up the prospects of user-centered and personalized search, with the potential of the answers being more meaningful to a user. As just one example, in a social network, the community formed by a person’s high school classmates can be significantly different from the community formed by her family members, which in turn can be quite different from the one formed by her colleagues.

Figure 1: Closest truss community example

Various community models have been proposed based on different dense subgraph structures such as k-core, k-truss, quasi-clique, and weighted densest subgraph, to name a few major examples. Of these, the k-truss, as a definition of a cohesive subgraph of a graph G, requires that each edge be contained in at least (k - 2) triangles within this subgraph. Consider the graph G in Figure 1: in the subgraph in the whole grey region (i.e., excluding the node t), each edge is contained in at least two triangles. Thus, the subgraph is a 4-truss. It is well known that most real-world social networks are triangle-based and have high local clustering coefficients. Triangles are known as the fundamental building blocks of networks [29]. In a social network, a triangle indicates that two friends have a common friend, which shows a strong and stable relationship among the three friends. Intuitively, the more common friends two people have, the stronger their relationship. In a k-truss, each pair of friends is “endorsed” by at least (k - 2) common friends. Thus, a k-truss with a large value of k signifies strong inner-connections between members of the subgraph.

Huang et al. proposed a community model based on the notion of k-truss as follows. Given one query node q and a parameter k, a k-truss community containing q is a maximal k-truss containing q, in which each edge is “triangle connected” with other edges. Triangle connectivity is strictly stronger than connectivity. The k-truss community model works well to find all overlapping communities containing a query node q. It is natural to search for communities containing a set of query nodes in real applications, but the above community model, extended for multiple query nodes, has the following limitation. Due to the strict triangle connectivity constraint, the model may fail to discover any community for the query nodes. For example, for the graph of Figure 1(a) and query nodes Q = {v4, q3, p1}, the above k-truss community model cannot find a qualified community for any k, since the edges (v4, q3) and (q3, p1) are not triangle connected in any k-truss. A detailed comparison of various community search models and techniques can be found in Section 7.

In this paper, we study the problem of close community search, i.e., given a set of query nodes, find a dense connected subgraph that contains the query nodes, in which nodes are close to each other. As a qualifying cohesive structure, we use the notion of k-truss for modeling a densely connected community, which inherits several good structural properties, such as (k - 1)-edge connectivity, bounded diameter and hierarchical structure. In addition, to ensure that every node included in the community is tightly related to the query nodes and to the other nodes in the reported community, we use graph diameter to measure the closeness of all nodes in the community. Thus, based on k-truss and graph diameter, we propose a novel community model called closest truss community (CTC), which requires that all query nodes are connected in this community and that the graph structure is a k-truss with the largest trussness k.

In general, several such candidate communities may exist. Some of them may suffer from the so-called “free rider effect”, which has been formally defined and studied in prior work. While we discuss this in detail in Section 3.2, we illustrate it with an example here. In Figure 1(a), for the query nodes {q1, q2, q3}, the subgraph shaded grey is a 4-truss containing the query nodes. It includes the nodes p1, p2, p3, which are intuitively not relevant to the query nodes. Specifically, they are all far away from q1 and can be regarded as “free riders”. This 4-truss is said to suffer from the free rider effect. On the other hand, the subgraph without the nodes {p1, p2, p3} is also a 4-truss, it has the smallest diameter among all 4-trusses containing the query nodes, and it does not suffer from the free rider effect. Motivated by this, we define a closest truss community as a connected k-truss with the largest k containing the query nodes and having the smallest diameter. We show that such a definition avoids the free rider effect.

A connected k-truss with the largest k containing given query nodes can be found in polynomial time. However, as we show, finding such a k-truss with the minimum diameter is NP-hard, and it is hard to approximate within a factor better than 2. Here, the approximation is w.r.t. the minimum diameter. On the other hand, we develop a greedy strategy for finding a CTC that delivers a 2-approximation to the optimal solution, thus essentially matching the lower bound. In order to make our algorithm scalable to large real networks, we propose two techniques. One of them is based on bulk deletion of nodes far away from the query nodes. The second is a heuristic exploration of the local neighborhood of a Steiner tree containing the query nodes. The challenge here is that a naive application of Steiner trees may yield a k-truss with a low value of k, which is undesirable. We address this challenge by developing a new notion of distance based on edge trussness. Specifically, we make the following contributions in this paper.

• We propose a novel community search model called closest truss community (CTC) and motivate the problem of finding a CTC containing given query nodes (Section 2).

• We analyze the structural and computational properties of CTC and show that it avoids the free rider effect, and that it is NP-hard to compute exactly or to approximate within a factor of (2 - ε), for any ε > 0 (Section 3).

• We develop a greedy 2-approximation algorithm for finding a CTC given a set of query nodes. The algorithm is based on finding, in linear time, a connected k-truss with maximum k containing the query nodes, using a simple truss index. Then nodes far away from the query nodes are successively eliminated (Section 4).

• We further speed up CTC search in two ways: (1) we make use of a bulk deletion strategy, and (2) we find a Steiner tree of the query nodes and expand it into a k-truss by exploring the local neighborhood of the Steiner tree. The first of these slightly degrades the approximation factor while the second is a heuristic (Section 5).

• We extensively experiment with the various algorithms on 6 real networks. Our results show that our closest truss community model can efficiently and effectively discover the queried communities on real-world networks with ground-truth communities (Section 6).

In Section 7, we present a detailed comparison with related work. We also discuss alternative candidates for community models and provide a rationale for our design decisions. We summarize the paper in Section 8.

Table 1: Frequently Used Notations

Notation            Description
G = (V(G), E(G))    An undirected and connected simple graph G
n; m                The number of vertices/edges in G
N(v)                The set of neighbors of v
sup_H(e)            The support of edge e in H
τ(H)                Trussness of graph H
τ(e)                Trussness of edge e
τ(v)                Trussness of vertex v
τ̄(S)                The maximum trussness of connected graphs containing S
diam(H)             The diameter of graph H
dist_H(v, u)        The shortest distance between v and u in H
dist_H(R, Q)        dist_H(R, Q) = max_{v∈R, u∈Q} dist_H(v, u)

2. PROBLEM DEFINITION 

We consider an undirected, unweighted simple graph G = (V(G), E(G)) with n = |V(G)| vertices and m = |E(G)| edges. We denote the set of neighbors of a vertex v by N(v), i.e., N(v) = {u ∈ V : (v, u) ∈ E}, and the degree of v by d(v) = |N(v)|. We use d_max = max_{v∈V} d(v) to denote the maximum vertex degree in G. W.l.o.g. we assume in this paper that the graph G we consider is connected. Note that this implies that m ≥ n - 1. Table 1 summarizes the frequently used notations in the paper.

A triangle in G is a cycle of length 3. Let u, v, w ∈ V be the three vertices on the cycle; then we denote this triangle by △uvw. The support of an edge e(u, v) ∈ E in G, denoted sup_G(e), is defined as |{△uvw : w ∈ V}|. When the context is obvious, we drop the subscript and denote the support as sup(e). Based on the definition of k-truss [13, 29], we define a connected k-truss as follows.

Definition 1 (Connected K-Truss). Given a graph G and an integer k, a connected k-truss is a connected subgraph H ⊆ G, such that ∀e ∈ E(H), sup_H(e) ≥ (k - 2).

Intuitively, a connected k-truss is a connected subgraph such that each edge (u, v) in the subgraph is “endorsed” by k - 2 common neighbors of u and v. In a connected k-truss graph, each node has degree at least k - 1, and a connected k-truss is also a (k - 1)-core. Next, we define the trussness of a subgraph, an edge, and a vertex as follows.
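The support and connected k-truss conditions of Definition 1 can be checked directly on a small graph. The following is a minimal Python sketch, not the paper's implementation; the helper names (`build_adj`, `support`, `is_connected`, `is_connected_k_truss`) and the edge-list input format are assumptions made for illustration.

```python
from itertools import combinations

def build_adj(edges):
    """Adjacency sets of an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def support(adj, u, v):
    """Number of triangles through edge (u, v): common neighbors of u and v."""
    return len(adj[u] & adj[v])

def is_connected(adj):
    """True if the graph has at least one vertex and a single component."""
    if not adj:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return len(seen) == len(adj)

def is_connected_k_truss(edges, k):
    """Definition 1: connected, and every edge has support >= k - 2."""
    adj = build_adj(edges)
    return is_connected(adj) and all(support(adj, u, v) >= k - 2 for u, v in edges)

# A 4-clique: every edge lies in exactly two triangles, so it is a 4-truss
# but not a 5-truss.
k4 = list(combinations("abcd", 2))
print(is_connected_k_truss(k4, 4), is_connected_k_truss(k4, 5))  # True False
```

The check treats the given edge list as the subgraph H itself, so supports are counted within H, as the definition requires.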

Definition 2 (Trussness). The trussness of a subgraph H ⊆ G is the minimum support of an edge in H plus 2, i.e., τ(H) = 2 + min_{e∈E(H)} {sup_H(e)}. The trussness of an edge e ∈ E(G) is τ(e) = max_{H⊆G ∧ e∈E(H)} {τ(H)}. The trussness of a vertex v ∈ V(G) is τ(v) = max_{H⊆G ∧ v∈V(H)} {τ(H)}.

Consider the graph G in Figure 1(a). Edge e(q2, v2) is contained in three triangles △q2v2q1, △q2v2v1 and △q2v2v5, thus its support is sup_G(e(q2, v2)) = 3. Suppose H is one of these triangles; then the trussness of the subgraph H is τ(H) = 2 + min_{e∈E(H)} sup_H(e) = 3, since each edge is contained in one triangle in H. The trussness of the edge e(q2, v2) is 4, because in the induced subgraph on vertices {q1, q2, v1, v2}, each edge is contained in two triangles in the subgraph, and any subgraph H containing e(q2, v2) has τ(H) ≤ 4, i.e., τ(e(q2, v2)) = max_{H⊆G ∧ e∈E(H)} {τ(H)} = 4. Note that the trussness of an edge e of a graph G could be less than sup_G(e) + 2, e.g., τ(e(q2, v2)) = 4 < 5 = sup(e(q2, v2)) + 2. Moreover, the vertex trussness of q2 is also 4, i.e., τ(q2) = 4.
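Edge trussness as in Definition 2 can be computed by the standard peeling procedure for truss decomposition: repeatedly delete the edges whose current support is too low to survive in a (k + 1)-truss and record k as their trussness. The sketch below is an illustrative, unoptimized version; the function name `edge_trussness` and the example graph (two 4-cliques sharing the edge ab, which is not the graph of Figure 1) are assumptions.

```python
from itertools import combinations

def edge_trussness(edges):
    """Map each undirected edge (as a frozenset) to its trussness by peeling."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    key = lambda u, v: frozenset((u, v))
    # Current support of every remaining edge.
    sup = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}
    truss = {}
    k = 2
    while sup:
        # Edges with support <= k - 2 cannot belong to any (k + 1)-truss.
        queue = [e for e, s in sup.items() if s <= k - 2]
        while queue:
            e = queue.pop()
            if e not in sup:          # already peeled via a duplicate entry
                continue
            u, v = tuple(e)
            # Removing (u, v) destroys one triangle for each common neighbor w.
            for w in adj[u] & adj[v]:
                for f in (key(u, w), key(v, w)):
                    sup[f] -= 1
                    if sup[f] <= k - 2:
                        queue.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
            truss[e] = k
            del sup[e]
        k += 1
    return truss

# Two 4-cliques {a,b,c,d} and {a,b,e,f} sharing edge ab: sup(ab) = 4,
# yet no 5-truss exists, so the trussness of ab is 4 < sup(ab) + 2.
two_k4 = list(combinations("abcd", 2)) + list(combinations("abef", 2))
truss = edge_trussness(two_k4)
print(truss[frozenset("ab")])
```

The example mirrors the remark above that an edge's trussness can be strictly smaller than its support plus 2. The trussness of a vertex is then the maximum trussness over its incident edges.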

For a set of vertices S ⊆ V(G), we use τ̄(S) to denote the maximum trussness of a connected subgraph H containing S, i.e., τ̄(S) = max_{S⊆H⊆G ∧ H is connected} {τ(H)}. Notice that by definition, for S = ∅, τ̄(∅) is the maximum trussness of any edge in G. In Figure 1(a), the whole subgraph in the grey region is a 4-truss. There exists no 5-truss in G, and τ̄(∅) = 4. We will make use of τ̄(∅) in Section 5.

For two nodes u, v ∈ G, we denote by dist_G(u, v) the length of the shortest path between u and v in G, where dist_G(u, v) = +∞ if u and v are not connected. We make use of the notions of graph query distance and diameter in the rest of the paper.

Definition 3 (Query Distance). Given a graph G and a set of query nodes Q ⊆ V, for each vertex v ∈ G, the vertex query distance of v is the maximum length of a shortest path from v to a query node q ∈ Q, i.e., dist_G(v, Q) = max_{q∈Q} dist_G(v, q). For a subgraph H ⊆ G, the graph query distance of H is defined as dist_G(H, Q) = max_{u∈H} dist_G(u, Q) = max_{u∈H, q∈Q} dist_G(u, q).

Definition 4 (Graph Diameter). The diameter of a graph G is defined as the maximum length of a shortest path in G, i.e., diam(G) = max_{u,v∈G} {dist_G(u, v)}.

For the graph G in Figure 1(a) and Q = {q2, q3}, the vertex query distance of v2 is dist_G(v2, Q) = max_{q∈Q} {dist_G(v2, q)} = 2, since dist_G(v2, q3) = 2 and dist_G(v2, q2) = 1. Let H be the subgraph of Figure 1(a) shaded in grey. Then the query distance of H is dist_H(H, Q) = 3. The diameter of H is diam(H) = 4.
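Definitions 3 and 4 translate into a few breadth-first searches on an unweighted graph. The sketch below is illustrative only: the function names and the adjacency-dictionary input are assumptions, and the example is a simple path a-b-c-d rather than the graph of Figure 1.

```python
from collections import deque

def bfs_distances(adj, src):
    """Hop distances from src to every vertex reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def vertex_query_distance(adj, v, Q):
    """dist(v, Q): largest distance from v to a query node (inf if unreachable)."""
    d = bfs_distances(adj, v)
    return max(d.get(q, float("inf")) for q in Q)

def graph_query_distance(adj, Q):
    """dist(G, Q): largest vertex query distance over all vertices."""
    return max(vertex_query_distance(adj, v, Q) for v in adj)

def diameter(adj):
    """diam(G): largest shortest-path length over all vertex pairs."""
    return max(
        max(bfs_distances(adj, v).get(u, float("inf")) for u in adj) for v in adj
    )

# Path a - b - c - d with query nodes {b, c}.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(vertex_query_distance(path, "a", {"b", "c"}))  # 2
print(graph_query_distance(path, {"b", "c"}), diameter(path))  # 2 3
```

On this path the query distance is 2 while the diameter is 3, which shows that the two measures can differ.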

On the basis of the definitions of k-truss and graph diameter, we define the closest truss community in a graph G as follows.

Definition 5 (Closest Truss Community). Given a graph G and a set of query nodes Q, G' is a closest truss community (CTC) if G' satisfies the following two conditions:

(1) Connected k-Truss. G' is a connected k-truss containing Q with the largest k, i.e., Q ⊆ G' ⊆ G and ∀e ∈ E(G'), sup(e) ≥ k - 2;

(2) Smallest Diameter. G' is a subgraph of smallest diameter satisfying condition (1). That is, ∄G'' ⊆ G such that diam(G'') < diam(G') and G'' satisfies condition (1).

Condition (1) requires that the closest community containing the query nodes Q be densely connected. In addition, Condition (2) makes sure that each node is as close as possible to every other node in the community, including the query nodes. We next illustrate the notion of CTC as well as the consequence of considering Conditions (1) and (2) in a different order.


Example 1. In Definition 5, we first consider the connected k-truss of G containing the query nodes with the largest trussness, and then, among such subgraphs, regard the one with the smallest diameter as the closest truss community. Consider the graph G in Figure 1(a) and Q = {q1, q2, q3}; the subgraph in the region shaded grey is a 4-truss containing Q, is a subgraph with the largest trussness that contains Q, and has diameter 4. Notice that in Figure 1(a), although the nodes p1, p2, p3 belong to the 4-truss and are strongly connected with q3, they are far away from the query node q1. Figure 1(b) shows another 4-truss containing Q but not p1, p2, p3, and its diameter is 3. It can be verified that this is the 4-truss with the smallest diameter. Thus, by Condition (2) of Definition 5, the 4-truss graph in Figure 1(a) will not be regarded as the closest truss community, whereas the one in Figure 1(b) is indeed the CTC. Intuitively, the nodes p1, p2, p3 are “free riders” that belong to a community defined only using Condition (1), and are avoided by Condition (2). We will see in Section 3.2 that the definition of CTC above avoids the so-called “free rider effect”. □

Example 2. Suppose we apply the conditions in Definition 5 in the opposite order. That is, we first minimize the diameter among connected subgraphs of G containing Q and look for the k-truss subgraph with the largest k among those. First, we find that the cycle {(q1, t), (t, q3), (q3, v4), (v4, q2), (q2, q1)} is the connected subgraph containing Q with the smallest diameter 2. Then, we find that this cycle is also the k-truss subgraph with the largest k containing itself. However, it is only a 2-truss, which has a loosely connected structure compared to Figure 1(b). This justifies the choice of the order in which Conditions (1) and (2) should be applied. □

We discuss several natural candidates for community models later in the paper and provide a rationale for our design decisions. We have a choice between minimizing diameter or minimizing query distance. We address this choice in Section 3.2, where Example 3 illustrates the value added by minimizing the diameter over minimizing just the query distance. The problem of closest truss community (CTC) search studied in this paper is stated as follows.

Problem 1 (CTC-Problem). Given a graph G(V, E) and a set of query vertices Q = {v1, ..., vr} ⊆ V, find a closest truss community containing Q.

3. PROBLEM ANALYSIS 
3.1 Structural Properties 

Since our closest truss community model is based on the concept of k-truss, the communities capture good structural properties of k-truss, such as being (k - 1)-edge-connected and having a hierarchical structure. In addition, since CTC is required to have minimum diameter, it also has bounded diameter. As a result, CTC avoids the “free rider effect” [27, 32] (see Section 3.2).

Small diameter, (k - 1)-edge-connected, hierarchical structure. First, the diameter of a connected k-truss with n vertices is no more than ⌊(2n - 2)/k⌋. The diameter of a community is considered an important feature of a community. Moreover, a k-truss community is (k - 1)-edge-connected, as it remains connected whenever fewer than k - 1 edges are removed. In addition, a k-truss based community has a hierarchical structure that represents the cores of a community at different levels of granularity, that is, a k-truss is always contained in the (k - 1)-truss for any k ≥ 3.

Largest k. We have a trivial upper bound on the maximum possible trussness of a connected k-truss containing the query nodes.

Lemma 1. For a connected k-truss H satisfying the definition of CTC for Q = {q1, ..., qr}, k ≤ min{τ(q1), ..., τ(qr)} holds.

Proof. First, we have Q ⊆ H. For each node q ∈ Q, q cannot be contained in a k-truss in G whenever k > τ(q). Thus, the fact that H is a k-truss subgraph containing Q implies that k ≤ min{τ(q1), ..., τ(qr)}. □

Lower and upper bounds on diameter. Since the distance function satisfies the triangle inequality, i.e., for all nodes u, v, w, dist_G(u, v) ≤ dist_G(u, w) + dist_G(w, v), we can express the lower and upper bounds on the graph diameter in terms of the query distance as follows.

Lemma 2. For a graph G(V, E) and a set of nodes Q ⊆ G, we have dist_G(G, Q) ≤ diam(G) ≤ 2 dist_G(G, Q).

Proof. First, the diameter diam(G) = max_{v,u∈G} dist_G(v, u) is clearly no less than dist_G(G, Q) = max_{v∈G, q∈Q} dist_G(v, q) for Q ⊆ G. Thus, dist_G(G, Q) ≤ diam(G). Second, suppose that the longest shortest path in G is between v and u. Then, ∀q ∈ Q, we have diam(G) = dist(v, u) ≤ dist(v, q) + dist(q, u) ≤ 2 dist_G(G, Q). The lemma follows. □

3.2 Free Rider Effect 

In previous work on community detection, researchers [27, 32] have identified an undesirable phenomenon called the “free rider effect”. Intuitively, if a definition of community admits irrelevant subgraphs in the detected community, we refer to such irrelevant subgraphs as free riders. For instance, suppose we use the classic density definition of average internal degree as the community goodness metric. Then, for a set of query nodes Q, the community is a subgraph containing Q with the maximum density. Then, any local community for Q merged with the densest subgraph part will increase the community density. However, the densest subgraph may be disconnected from or irrelevant to the query nodes. This shows that the simple density metric suffers from the free rider effect. Wu et al. [32] show that several other goodness metrics, including minimum degree, local modularity, and external conductance, suffer from the free rider effect. Following Wu et al., we define the free rider effect as follows. Typically, a community definition is based on a goodness metric f(H) for a subgraph H: subgraphs with minimum* f(H) value are defined as communities. E.g., for our CTC problem, diameter is the goodness metric: among all subgraphs with maximum trussness, the smaller the diameter of H, the better it is as a community. The definition of free rider effect is based on this goodness metric. We term a community query-independent if it is the solution to the community search with Q set to ∅.

Definition 6 (FRE). Given a non-empty query Q, let H be a solution to a community definition based on a goodness metric f(·). Let H* be a (global or local) optimum solution, which is query-independent. If f(H ∪ H*) ≤ f(H), we say that the definition suffers from the free rider effect. Here, nodes in H* \ H are called free riders for the query Q and community H.

Example 3. Consider Figure 2, showing a graph G and query nodes Q = {q1, q2}. It also shows subgraphs G1 and G2. All three graphs, G, G1, and G2, are 4-trusses containing Q. The query distance of the star node r is 3, while that for all other nodes is at most 2. Thus, the query distance of G is 3. The subgraph G1 has the minimum query distance 2 among all 4-trusses containing Q. However, its diameter is 3, as the distance between square node v and circle node p is 3. On the other hand, the subgraph G2, while having the same query distance as G1, has a strictly smaller diameter 2. It has the minimum diameter among all 4-trusses containing Q.

Both the star node and the square nodes are free riders. The star node is the furthest from query node q2, and its removal from G leaves the trussness unchanged. The square nodes have the same query distance 2 as the circle node p. However, the square nodes are not close enough to other nodes of the community: e.g., their distance to circle node p is 3. Unlike the circle nodes, removal of square nodes leaves the trussness unchanged. Thus, the square nodes are also free riders, while the circle nodes are not. Minimizing query distance among 4-trusses eliminates the free rider star node but not the square free rider nodes, while minimizing diameter eliminates both kinds of free riders. □

Figure 2: A graph G with Q = {q1, q2}.

*We use minimum w.l.o.g.

We next show that our definition of CTC avoids the problem of 
free rider effect. 

In general, there may be multiple CTCs H, i.e., connected k-trusses with maximum trussness containing Q with the minimum diameter. For example, consider the graph G in Figure 1 and Q = {q3}. The subgraphs of G induced respectively by {q3, p1, p2, p3} and {q3, v3, v4, v5} are both 4-trusses with diameter 1. Both happen to be maximal in that they are not contained in any other 4-truss with this property.

Proposition 1. For any graph G and query nodes Q ⊆ V(G), there is a solution H to the CTC search problem such that for all query-independent optimal solutions H*, either H* = H, or H ∪ H* is disconnected, or H ∪ H* has a strictly larger diameter than H.

Proof. Let C(G, Q) denote the set of optimal solutions to the CTC search problem on graph G and query nodes Q. C(G, Q) is partially ordered w.r.t. the graph containment order ⊆. Let H be any maximal element of C(G, Q), let H* be any query-independent optimal solution, and consider H ∪ H*. Assume w.l.o.g. that (H* \ H) ≠ ∅. Suppose that H ∪ H* is a connected k-truss with maximum trussness containing Q, and diam(H ∪ H*) ≤ diam(H). This contradicts the maximality of H. □

3.3 Hardness and Approximation 

Hardness. In the following, we show that the CTC-Problem is NP-hard. To this end, we define the decision version of the CTC-Problem.

Problem 2 (CTCk-Problem). Given a graph G(V, E), a set of query nodes Q = {v1, ..., vr} ⊆ V and parameters k and d, test whether G contains a connected k-truss subgraph with diameter at most d that contains Q.

Theorem 1. The CTCk-Problem is NP-hard.

Proof. We reduce the well-known NP-hard problem of Maximum Clique (decision version) to the CTCk-Problem. Given a graph G(V, E) and a number k, the Maximum Clique Decision problem is to check whether G contains a clique of size k. From this, construct an instance of the CTCk-Problem, consisting of graph G, parameters k and d = 1, and the empty set of query nodes Q = ∅. We show that the instance of the Maximum Clique Decision problem is a YES-instance iff the corresponding instance of the CTCk-Problem is a YES-instance. Clearly, any clique with at least k nodes is a connected k-truss with diameter 1. On the other hand, given a solution H for the CTCk-Problem, H must contain at least k nodes since H is a k-truss, and diam(H) ≤ d = 1, which implies H is a clique. □

The hardness of the CTC-Problem follows from this. The next natural question is whether the CTC-Problem can be approximated.

Approximation. For α ≥ 1, we say that an algorithm achieves an α-approximation to the closest truss community (CTC) search problem if it outputs a connected k-truss subgraph H ⊆ G such that Q ⊆ H, τ(H) = τ(H*) and diam(H) ≤ α · diam(H*), where H* is the optimal CTC. That is, H* is a connected k-truss with the largest k s.t. Q ⊆ H*, and diam(H*) is the minimum among all such CTCs containing Q. Notice that the trussness of the output subgraph H matches that of the optimal solution H* and that the approximation is only w.r.t. the diameter: the diameter of H is required to be no more than α · diam(H*).

Non-Approximability. We next prove that the CTC-Problem cannot be approximated within a factor better than 2. We establish this result through a reduction, again from the Maximum Clique Decision problem, to the problem of approximating the CTC-Problem, given k. In the next section, we develop a 2-approximation algorithm for the CTC-Problem, thus essentially matching this lower bound. Notice that the CTC-Problem with a given parameter k is essentially the CTCk-Problem.

Theorem 2. Unless P = NP, for any ε > 0, the CTC-Problem with given parameter k cannot be approximated in polynomial time within a factor (2 - ε) of the optimal.

Proof. Suppose there exists a polynomial time algorithm A for the CTC-Problem with a given k that provides a solution H with an approximation factor (2 - ε) of the optimal solution H*. Set the query nodes Q = ∅. By our assumption, we have Q ⊆ H, τ(H) = τ(H*) = k and diam(H) ≤ (2 - ε) · diam(H*). Next, we use this approximate solution to exactly solve the Maximum Clique Decision problem as follows. Since the latter cannot be done in polynomial time unless P = NP, the theorem follows.

Run algorithm A on a given instance G of the Maximum Clique Decision problem, with parameter k and query nodes Q = ∅. We claim that G contains a clique of size k iff A outputs a solution H with τ(H) = k and diam(H) = 1. To see this, suppose diam(H) = 1; then the optimal solution H* has diam(H*) ≤ diam(H) = 1, and H* is a connected k-truss, which shows that H* is a clique of size k in G. On the other hand, suppose diam(H) ≥ 2. Then we have 2 · diam(H*) > (2 - ε) · diam(H*) ≥ diam(H) ≥ 2. Since the diameter is an integer, we deduce that diam(H*) ≥ 2. In this case, G cannot possibly contain a clique of size k, for if it did, that clique would be the optimal solution to the CTC-Problem on G, with parameter k, whose diameter is 1, which contradicts the optimality of H*. Thus, using algorithm A, we can distinguish between the YES and NO instances of the Maximum Clique Decision problem. This was to be shown. □


Algorithm 1 Basic (G, Q) 

Input: A graph G = {V, E), a set of query nodes Q = {qi, ...,qr}. 
Output: A connected fc-truss R with a small diameter. 

1: Find a maximal connected fc-truss containing Q with the lai'gest fc as 

Go//see Algorithm 
2 : ; 0 ; 

3: while connect^^ (Q) = true do 
4: Compute dist^^ (g, w), Vg € Q and \/u € Gf, 

5: u* ^ argmax„gG, distG'j(u, Q); 

6: distG, (Gi,Q) distG, (u*,Q); 

7: Delete u* and its incident edges from Gi; 

8: Maintain fc-truss property of Gi //see Algorithm [3l 

9: G/+1 ^G/G^/ + l; 

10: R <- argminG/g{Gg_,,,_( 3 j_^} distg;/(G', Q); 


a 2-approximation to the optimal result. Finally, we discuss proce¬ 
dures for an efficient implementation of the algorithm and analyze 
its time and space complexity. 

4.1 Basic Algorithmic Framework 

Here is an overview of our algorithm Basic. First, given a graph 
G and query nodes Q, we find a maximal connected fc-truss, de¬ 
noted as Go, containing Q and having the largest trussness. As Go 
may have a large diameter, we iteratively remove nodes far away 
from the query nodes, while maintaining the trussness of the re¬ 
mainder graph at fc. 

Algorithm. Algorithm[T]outlines a framework for finding a closest 
truss community based on a greedy strategy. For query nodes Q, we 
first find a maximal connected fc-truss Go that contains Q, s.t. fc = 
r(Go) is the largest (line 1). Then, we set I — 0. For all u £ Gi 
and q G Q,we compute the shortest distance between u and q (line 
4), and obtain the vertex query distance distcj (u, Q). Among all 
vertices, we pick up a vertex u* with the maximum distcj {u*, Q), 
which is also the graph query distance distc, {Gi, Q) (lines 5-6). 
Next, we remove the vertex u* and its incident edges from Gi, 
and delete any nodes and edges needed to restore the fc-truss prop¬ 
erty of Gi (lines 7-8). We assign the updated graph as a new Gi. 
Then, we repeat the above steps until Gi does not have a connected 
subgraph containing Q (lines 3-9). Finally, we terminate by out- 
putting graph R as the closest truss community, where R is any 
graph G' G {Go,..., Gi_i} with the smallest graph query dis¬ 
tance distfj/(G^ Q) (line 10). Note that each intermediate graph 
G' € (Go,..., Gi_i} is a fc-truss with the maximum trussness as 
required. 

Example 4. We apply Algorithm 1 on G in Figure 1 for Q = {q_1, q_2, q_3}. First, we obtain the 4-truss subgraph G_0 shaded in grey, using a procedure we will shortly explain. Then, we compute all shortest distances, and get the maximum vertex query distance as dist_{G_0}(p_1, Q) = 4, and u* = p_1. We delete node p_1 and its incident edges from G_0; we also delete p_2 and p_3, in order to restore the 4-truss property. The resulting subgraph is G_1. Any further deletion of a node in the next iteration of the while loop will induce a series of deletions in line 8, eventually making the graph disconnected or containing just a part of the query nodes. As a result, the output graph R, shown in Figure 1(b), is just G_1. Also, dist_R(R, Q) = 3, and R happens to be the exact CTC with diameter 3, which is optimal.


4. ALGORITHMS

In this section, we present a greedy algorithm called Basic for the CTC search problem. Then, we show that this algorithm achieves a 2-approximation to the optimal solution.

4.2 Approximation Analysis

Algorithm 1 achieves a 2-approximation to the optimal solution; that is, the obtained connected k-truss community R satisfies Q ⊆ R, τ(R) = τ(H*), and diam(R) ≤ 2 diam(H*), for any optimal solution H*. Since any graph in {G_0, ..., G_{l-1}} is a connected k-truss with the largest k containing Q by Algorithm 1, and R ∈ {G_0, ..., G_{l-1}}, we have Q ⊆ R and τ(R) = τ(H*). In the following, we will prove that diam(R) ≤ 2 diam(H*). We start with a few key results. For graphs G_1, G_2, we write G_1 ⊆ G_2 to mean V(G_1) ⊆ V(G_2) and E(G_1) ⊆ E(G_2).

Fact 1. Given two graphs G_1 and G_2 with G_1 ⊆ G_2, for u, v ∈ V(G_1), dist_{G_2}(u, v) ≤ dist_{G_1}(u, v) holds. Moreover, if Q ⊆ V(G_1), then dist_{G_2}(G_1, Q) ≤ dist_{G_1}(G_1, Q) also holds.

Proof. Trivially follows from the fact that G_2 preserves paths between nodes in G_1. □

Recall that in Algorithm 1, in each iteration i, a node u* with maximum dist_{G_i}(u*, Q) is deleted from G_i, but dist_{G_i}(G_i, Q) is not monotone nonincreasing during the process; hence dist_{G_{l-1}}(G_{l-1}, Q) is not necessarily the minimum. Note that in Algorithm 1, G_l is not the last feasible graph (i.e., connected k-truss containing Q), but G_{l-1} is. The observation is shown in the following lemma.

Lemma 3. In Algorithm 1, it is possible that for some 0 ≤ i < j < l, we have G_j ⊆ G_i and dist_{G_i}(G_i, Q) < dist_{G_j}(G_j, Q).

Proof. This is easy to see, because for a vertex v ∈ G, dist_G(v, Q) is monotone non-decreasing w.r.t. subgraphs of G. More precisely, for v ∈ G_i ∩ G_j, dist_{G_i}(v, Q) ≤ dist_{G_j}(v, Q) holds. □

Example 5. To illustrate the lemma, suppose the graph in Figure 3(a) is G_0, a connected 4-truss containing the query nodes Q = {q_1} in some initial graph G (not shown), and suppose the maximum trussness of such a subgraph is 4. One of the furthest nodes from Q in G_0 is t_3, which has query distance dist_{G_0}(t_3, Q) = 2. After deleting the node t_3 from G_0, we remove all incident edges of nodes t_1, t_2 and t_3, since the 4-truss subgraph induced by {q_1, q_3, t_1, t_2, t_3} in the dashed region does not exist any more in G_1 in Figure 3. Thus, we have the largest query distance as dist_{G_1}(G_1, Q) = dist_{G_1}(q_3, Q) = 3, which is larger than dist_{G_0}(G_0, Q) = 2. □

Figure 3: Closest truss community example

We have an important observation: if an intermediate graph G_i obtained by Algorithm 1 contains an optimal solution H*, i.e., H* ⊆ G_i, and dist_{G_i}(G_i, Q) > dist_{G_i}(H*, Q), then the algorithm will not terminate at G_{i+1}.

Lemma 4. In Algorithm 1, for any intermediate graph G_i, if H* ⊆ G_i and dist_{G_i}(G_i, Q) > dist_{G_i}(H*, Q), then G_{i+1} is a connected k-truss containing Q and H* ⊆ G_{i+1}.

Proof. Suppose H* ⊆ G_i and dist_{G_i}(G_i, Q) > dist_{G_i}(H*, Q). Then there exists a node u ∈ G_i \ H* s.t. dist_{G_i}(u, Q) = dist_{G_i}(G_i, Q) > dist_{G_i}(H*, Q). Clearly, u ∉ Q. In the next iteration, Algorithm 1 will delete u from G_i (Step 7), and perform Step 8. The graph resulting from restoring the k-truss property is G_{i+1}. Since H* is a connected k-truss containing Q, the restoration step (line 8) must find a subgraph G_{i+1} s.t. H* ⊆ G_{i+1}, and G_{i+1} is a connected k-truss containing Q. Thus, the algorithm will not terminate in iteration (i + 1). □

We are ready to establish the main result of this section. Our polynomial algorithm can find a connected k-truss community R having the minimum query distance to Q, which is optimal.


Lemma 5. For any connected k-truss H with the highest k containing Q, dist_R(R, Q) ≤ dist_H(H, Q).

Proof. The following cases arise for G_{l-1}, which is the last feasible graph obtained by Algorithm 1.

Case (a): H ⊆ G_{l-1}. We have dist_{G_{l-1}}(G_{l-1}, Q) ≤ dist_{G_{l-1}}(H, Q); for otherwise, if dist_{G_{l-1}}(G_{l-1}, Q) > dist_{G_{l-1}}(H, Q), we can deduce from Lemma 4 that G_{l-1} is not the last feasible graph obtained by Algorithm 1, a contradiction. Thus, according to Step 10 in Algorithm 1 and dist_{G_{l-1}}(G_{l-1}, Q) ≤ dist_{G_{l-1}}(H, Q), we have dist_R(R, Q) ≤ dist_{G_{l-1}}(G_{l-1}, Q) ≤ dist_{G_{l-1}}(H, Q) ≤ dist_H(H, Q), where the last step uses Fact 1.

Case (b): H ⊈ G_{l-1}. There exists a vertex v ∈ H deleted from one of the subgraphs {G_0, ..., G_{l-2}}. Suppose the first deleted vertex v* ∈ H is in graph G_i, where 0 ≤ i ≤ l − 2; then v* must be deleted in Step 7, but not in Step 8. This is because each vertex/edge of H satisfies the condition of k-truss, and will not be removed before some vertex of H is removed from G_i. Then, we have dist_{G_i}(G_i, Q) = dist_{G_i}(v*, Q) = dist_{G_i}(H, Q), and dist_{G_i}(G_i, Q) ≥ dist_R(R, Q) by Step 10. As a result, dist_R(R, Q) ≤ dist_{G_i}(H, Q) ≤ dist_H(H, Q). □

Based on the preceding lemmas, we have: 

Theorem 3. Algorithm 1 provides a 2-approximation to the CTC-Problem, as diam(R) ≤ 2 diam(H*).

Proof. Since dist_R(R, Q) ≤ dist_{H*}(H*, Q) by Lemma 5, we get diam(R) ≤ 2 dist_R(R, Q) ≤ 2 dist_{H*}(H*, Q) ≤ 2 diam(H*) by Lemma 2. The theorem follows from this. □
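The chain of inequalities in this proof can be unpacked step by step. The sketch below assumes that Lemma 2 (stated outside this section) has the form dist_G(G, Q) ≤ diam(G) ≤ 2 dist_G(G, Q) for a connected graph G containing Q; the upper bound is the triangle inequality through an arbitrary query node q.

```latex
\begin{align*}
\mathrm{diam}(R)
  &= \max_{u,v \in R} \mathrm{dist}_R(u,v)
   \le \max_{u,v \in R} \bigl(\mathrm{dist}_R(u,q) + \mathrm{dist}_R(q,v)\bigr)
   && \text{(any fixed } q \in Q\text{)} \\
  &\le 2\,\mathrm{dist}_R(R,Q)
   && \text{(upper bound, assumed form of Lemma 2)} \\
  &\le 2\,\mathrm{dist}_{H^*}(H^*,Q)
   && \text{(Lemma 5 with } H = H^*\text{)} \\
  &\le 2\,\mathrm{diam}(H^*)
   && \text{(lower bound, assumed form of Lemma 2)}
\end{align*}
```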

4.3 K-truss Identification and Maintenance 

In this section, we introduce the detailed implementation of Algorithm 1. Finding G_0, the maximal connected k-truss containing Q with the largest trussness k, is a basic primitive in our problem. A straightforward method is to apply a truss decomposition algorithm, and delete edges in ascending order of edge support from G, until Q becomes disconnected. Then we can obtain the largest trussness k and recover G_0 by keeping all k-truss edges. However, this method is quite costly. To find G_0 efficiently, we design an index structure. The index is constructed by organizing edges according to their trussness.

Index Construction. We first apply a truss decomposition algorithm such as [29] and compute the trussness of each edge of graph G. We omit the details of this algorithm due to space limitation.

Based on the obtained edge trussness, we construct our truss index as follows. For each vertex v ∈ V, we sort its neighbors N(v) in descending order of the edge trussness τ(e(v, u)), for u ∈ N(v). For each distinct trussness value k ≥ 2, we mark the position of the first vertex u in the sorted adjacency list where τ(e(v, u)) = k. This supports efficient retrieval of v's incident edges with a certain trussness value. The vertex trussness of v is also kept as τ(v) = max{τ(e(v, u)) | u ∈ N(v)}, which is the trussness of the first edge in the sorted adjacency list. Moreover, we build a hashtable to keep all the edges and their trussness values.

Algorithm 2 FindG0 (G, Q)

Input: A graph G = (V, E), a set of query nodes Q = {q_1, ..., q_r}.
Output: A connected k-truss G_0 containing Q with the largest k.

 1: k ← min{τ(q_1), ..., τ(q_r)}; // see Lemma 1
 2: V(G_0) ← ∅; S_k ← Q;
 3: while connected_{G_0}(Q) = false do
 4:   for v ∈ S_k do
 5:     if v ∈ V(G_0) then
 6:       k_max ← k + 1;
 7:     else
 8:       k_max ← +∞; V(G_0) ← V(G_0) ∪ {v};
 9:     for (v, u) ∈ G with k ≤ τ(v, u) < k_max do
10:       G_0 ← G_0 ∪ {(v, u)};
11:       if u ∉ S_k then S_k ← S_k ∪ {u};
12:     l ← max{τ(v, u) | (v, u) ∉ G_0};
13:     S_l ← S_l ∪ {v};
14:   k ← k − 1;
15: Compute the edge support sup(v, u) in G_0, for all (v, u) ∈ G_0;


This is identical to the simple truss index used in prior work, and we refer to it as the truss index.
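The index layout described in this section can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the truss decomposition is a naive peeling loop (not the decomposition algorithm cited in the paper), and the names `edge`, `truss_decomposition` and `build_truss_index` are hypothetical. Node labels are assumed to be mutually comparable.

```python
def edge(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u < v else (v, u)

def truss_decomposition(graph):
    """Naive peeling: return {edge: trussness} for an adjacency dict of sets."""
    adj = {u: set(vs) for u, vs in graph.items()}
    sup = {edge(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u]}
    truss, k = {}, 2
    while sup:
        low = [e for e, s in sup.items() if s <= k - 2]
        if not low:          # every remaining edge survives in the (k+1)-truss
            k += 1
            continue
        for u, v in low:
            for w in adj[u] & adj[v]:   # each triangle lost lowers two supports
                sup[edge(u, w)] -= 1
                sup[edge(v, w)] -= 1
            adj[u].discard(v)
            adj[v].discard(u)
            del sup[(u, v)]
            truss[(u, v)] = k
    return truss

def build_truss_index(graph, truss):
    """Per vertex: neighbors sorted by descending edge trussness, the first
    position of each trussness value, and the vertex trussness."""
    index = {}
    for v in graph:
        nbrs = sorted(graph[v], key=lambda u: -truss[edge(v, u)])
        first_pos = {}
        for i, u in enumerate(nbrs):
            first_pos.setdefault(truss[edge(v, u)], i)
        vertex_truss = truss[edge(v, nbrs[0])] if nbrs else 0
        index[v] = (nbrs, first_pos, vertex_truss)
    return index

# Illustrative input: two 4-cliques sharing edge (3, 4), plus a pendant edge (6, 7).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (3, 5), (3, 6), (4, 5), (4, 6), (5, 6), (6, 7)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
truss = truss_decomposition(graph)
index = build_truss_index(graph, truss)
```

The `truss` dictionary plays the role of the hashtable of edge trussness values, and `first_pos` gives the marked positions in each sorted adjacency list.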

In the following, we will show that this truss index is sufficient to design an algorithm for finding the maximal connected k-truss containing given query nodes Q in time O(m'), where m' = |E(G_0)|. This time complexity is essentially optimal. We remark on the complexity of the truss index construction below.

Remark 1. The construction of this truss index takes O(ρ · m) time and O(m) space, where ρ is the arboricity of graph G, i.e., the minimum number of spanning forests needed to cover all edges of G. Notice that ρ ≤ min{d_max, √m}.

Finding G_0. Based on the index, we present Algorithm 2 for finding G_0, the maximal connected k-truss containing Q with the largest trussness k. We initialize G_0 to be the query vertex set Q, and iteratively add the edges of G in decreasing order of trussness, until G_0 gets connected.

The initial trussness level of the edges to be included in G_0 is computed as k = min{τ(q_1), ..., τ(q_r)} (line 1). This is motivated by the fact that, by Lemma 1, for any k' > k, no connected k'-truss can contain Q. We use S_k to denote the set of nodes to be visited within level k. We start with S_k = Q (line 2). For a given k, we process each node v ∈ S_k, and visit its neighbors in a BFS manner. Then, we insert those of its incident edges (v, u) with k ≤ τ(v, u) < k_max into G_0, where k_max is the maximum possible trussness of unvisited edges. This is because all these edges should be present in a connected k-truss. Meanwhile, if the neighbor u is not in S_k, we add u into S_k (line 11), since unvisited edges incident to u may have trussness no less than k. After checking all edges incident to v, we add v to S_l, where l = max{τ(v, u) | u ∈ N(v), τ(v, u) < k} (lines 12-13). Notice that l is the next highest level for which a connected l-truss contains the node v; recording it avoids scanning the neighbor set of v at each level. After traversing all vertices in S_k, the algorithm checks whether Q is connected in G_0. If yes, the algorithm terminates, and G_0 is returned; otherwise, we decrease the present level k by 1 (line 14), and repeat the above steps (lines 4-14). After obtaining G_0, we compute all edge supports by counting triangles in G_0, which is used for the k-truss maintenance (line 15).
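The level-lowering idea behind finding G_0 can be sketched as follows. This is a simplified illustration, not Algorithm 2 itself: instead of reusing the sets S_k across levels via the sorted adjacency lists, it simply restarts a BFS at each level k, which is easier to read but does not achieve the O(m') bound. The function name `find_g0` and the toy trussness values are assumptions for illustration.

```python
from collections import deque

def find_g0(adj, truss, Q):
    """Return (k, edges) for the connected k-truss containing Q with largest k,
    given edge trussness values keyed by (u, v) with u < v."""
    def tau(u, v):
        return truss[(u, v) if u < v else (v, u)]

    # Start from the smallest vertex trussness among the query nodes.
    k = min(max(tau(q, u) for u in adj[q]) for q in Q)
    while k >= 2:
        seen, queue, edges = {Q[0]}, deque([Q[0]]), set()
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if tau(v, u) >= k:           # only edges of trussness >= k
                    edges.add((v, u) if v < u else (u, v))
                    if u not in seen:
                        seen.add(u)
                        queue.append(u)
        if all(q in seen for q in Q):        # Q connected at this level
            return k, edges
        k -= 1
    return None

# Illustrative input: two 4-cliques joined by one bridge edge of trussness 2.
clique_a = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
clique_b = [(5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
truss = {e: 4 for e in clique_a + clique_b}
truss[(4, 5)] = 2
adj = {}
for u, v in truss:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
k_far, edges_far = find_g0(adj, truss, [1, 8])
k_near, edges_near = find_g0(adj, truss, [1, 2])
```

Query nodes in different cliques are only connected once the level drops to 2, whereas query nodes inside one clique are already connected at level 4.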

The following example illustrates the algorithm. 

Example 6. Consider the graph G in Figure 4 with Q = {q_1, q_2}. The trussness of each edge is displayed, e.g., τ(q_1, v_1) = 4. Now, we apply Algorithm 2 on G to find G_0 containing Q. We can verify that τ(q_1) = τ(q_2) = 4, so we start with level k = 4 and set S_4 = {q_1, q_2}. Then, we process the node q_1 ∈ S_4, and insert all its incident edges into G_0, for the trussness of each edge is 4. Meanwhile, all its neighbors are inserted into S_4. We repeat the above process for each node in S_4. Note that for nodes t_1, t_2, τ(t_1, t_2) = 2, so we insert t_1, t_2 into S_2 (lines 12-13 of Algorithm 2). Then, at level k = 4, we get the 4-truss as the whole graph in Figure 4 minus the edge (t_1, t_2), for τ(t_1, t_2) = 2. Since the current G_0 is not connected, we decrease the truss level k to 3, and find that S_3 = ∅. Then, we decrease k to 2, and find that S_2 = {t_1, t_2}. So we expand from the edge incident to t_1, and insert the edge (t_1, t_2) into G_0, and find that the resulting graph contains Q and is connected. In this example, G_0 happens to coincide with G. □

Figure 4: An example graph G of finding G_0

Algorithm 3 K-trussMaintenance (G, V_d)

Input: A graph G = (V, E), a set of nodes to be removed as V_d.
Output: A k-truss graph.

 1: S ← ∅; // S is the set of removed edges.
 2: for v ∈ V_d and (v, u) ∈ G do
 3:   S ← S ∪ {(v, u)};
 4: for (v, u) ∈ S do
 5:   for w ∈ N(v) ∩ N(u) do
        // Update the support of edges (v, w) and (u, w)
 6:     sup(v, w) ← sup(v, w) − 1; sup(u, w) ← sup(u, w) − 1;
 7:     if sup(v, w) < k − 2 and (v, w) ∉ S then S ← S ∪ {(v, w)};
 8:     if sup(u, w) < k − 2 and (u, w) ∉ S then S ← S ∪ {(u, w)};
 9:   Remove (v, u) from G;
10: Remove isolated vertices from G;

Remark 2. Based on the truss index, for each vertex v, in line 9 of Algorithm 2, each edge (v, u) can be accessed in constant time using the sorted adjacency list of v, and in line 12, we can compute l in constant time. Algorithm 2 takes time O(m') where m' = |E(G_0)|.

Computing Query Distance. For a vertex v, to compute the query distance dist_{G_l}(v, Q), we need to perform |Q| BFS traversals on graph G_l. Specifically, for each query node q ∈ Q, with one BFS traversal starting from q in G_l, we can obtain the shortest distance dist_{G_l}(v, q) for each node v ∈ G_l. Then, dist_{G_l}(v, Q) is the maximum of all shortest distances dist_{G_l}(v, q), for q ∈ Q.
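A minimal Python sketch of this computation, with one BFS per query node followed by a per-vertex maximum. The function names are hypothetical, and unreachable vertices are given distance infinity as an illustrative convention.

```python
from collections import deque

def bfs(adj, src):
    """Hop distances from src in an adjacency dict {node: set_of_neighbors}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def query_distances(adj, Q):
    """Vertex query distance: dist(v, Q) = max over q in Q of dist(v, q)."""
    per_q = [bfs(adj, q) for q in Q]          # |Q| BFS traversals
    return {v: max(d.get(v, float("inf")) for d in per_q) for v in adj}

def graph_query_distance(adj, Q):
    """Graph query distance: the largest vertex query distance."""
    return max(query_distances(adj, Q).values())

# Illustrative input: the path 1 - 2 - 3 - 4 - 5 with query nodes 1 and 3.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
qd = query_distances(path, [1, 3])
```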

K-truss Maintenance. Algorithm 3 describes the procedure for maintaining G as a k-truss after the deletion of nodes V_d from G. In Algorithm 1, V_d = {u*} (see line 8); Section 5 discusses deleting a set of nodes V_d in batch. Generally speaking, after removing nodes V_d and their incident edges from G, G may not be a k-truss any more, or Q may be disconnected. Thus, Algorithm 3 iteratively deletes edges having fewer than (k − 2) triangles and nodes disconnected from Q, until G becomes a connected k-truss containing Q.

Algorithm 3 first pushes all edges incident to nodes V_d into set S (lines 1-3). Then, for each edge (u, v) ∈ S, the algorithm checks every triangle △uvw where w ∈ N(u) ∩ N(v), and decreases the support of edges (u, w) and (v, w) by 1. Any edge e ∉ S with resulting support sup(e) < k − 2 is added to S. After traversing all triangles containing (u, v), the edge (u, v) is deleted from G. This process continues until S becomes empty (lines 4-9), and then the algorithm removes all isolated vertices from G (line 10).
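The maintenance procedure can be sketched in Python as follows. The sketch mirrors the pseudocode of Algorithm 3 (edge cascade plus removal of isolated vertices) and, like the pseudocode, leaves the check that Q stays connected to the caller. The function name, the in-place update convention, and the toy graph are assumptions for illustration.

```python
def ktruss_maintenance(adj, sup, k, removed_nodes):
    """Delete removed_nodes and cascade until every edge has support >= k - 2.
    adj: {node: set}; sup: {(u, v) with u < v: support}. Both are updated in place."""
    def key(u, v):
        return (u, v) if u < v else (v, u)

    in_s = {key(v, u) for v in removed_nodes for u in adj.get(v, ())}
    worklist = list(in_s)                     # the set S of edges to remove
    i = 0
    while i < len(worklist):
        v, u = worklist[i]
        i += 1
        for w in adj[v] & adj[u]:             # each triangle (v, u, w) disappears
            for e in (key(v, w), key(u, w)):
                sup[e] -= 1
                if sup[e] < k - 2 and e not in in_s:
                    in_s.add(e)
                    worklist.append(e)
        adj[v].discard(u)
        adj[u].discard(v)
        del sup[(v, u)]
    for x in [x for x in adj if not adj[x]]:  # drop isolated vertices
        del adj[x]

# Illustrative input: two 4-cliques sharing edge (3, 4), maintained as a 4-truss.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
sup = {e: len(adj[e[0]] & adj[e[1]]) for e in edges}
ktruss_maintenance(adj, sup, k=4, removed_nodes=[5])
```

Deleting node 5 leaves node 6 with edges of support below 2, so the cascade removes them as well and only the first clique survives.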

4.4 Complexity analysis 

In the implementation of Algorithm 1, we do not need to keep all intermediate graphs, but just record the removal of vertices/edges at each iteration. Let G_0 be the maximal connected k-truss found in line 1 of Algorithm 1. Let n' = |V(G_0)| and m' = |E(G_0)|, and let d'_max be the maximum degree of a vertex in G_0.

At each iteration i of Algorithm 1, we delete at least one node and its incident edges from G_i. Since every node of a k-truss has degree at least k − 1, the number of removed edges is no less than k − 1; thus the total number of iterations is t ≤ min{n' − k, m'/(k − 1)}, i.e., t is O(min{n', m'/k}). We have:

Theorem 4. Algorithm 1 takes O((|Q|t + ρ)m') time and O(m') space, where t ∈ O(min{n', m'/k}), and ρ is the arboricity of graph G_0. Furthermore, we have ρ ≤ min{d'_max, √m'}.

Proof. First, finding the k-truss G_0, listing all triangles of G_0 and creating a series of k-truss graphs {G_0, ..., G_{l-1}} takes O(ρ · m') time in all, where ρ is the arboricity of graph G_0.

Second, in each iteration, the algorithm needs to compute the shortest distances by a BFS traversal starting from each query node q ∈ Q, which takes O(|Q|m') time. Since the algorithm runs in t iterations, the total time cost is O(t|Q|m'). Thus, the overall time complexity of Algorithm 1 is O((|Q|t + ρ)m').

Next, we analyze the space complexity. For graphs {G_0, ..., G_l}, we only record the sequence of removed edges from G_0 for attaching a corresponding label to a graph G_i at each iteration i, which takes O(m') space in all. For each vertex v ∈ G_i, we only keep dist(v, Q) instead of all query distances dist(v, q) for q ∈ Q, which takes O(n') space. Hence, the space complexity of Algorithm 1 is O(m' + n'), which is O(m'), as G_0 is connected. □

5. FAST SEARCH ALGORITHMS 

In this section, we focus on improving the efficiency of CTC search in two ways. First, we develop a new greedy strategy to speed up the pruning process of Section 4.1 by deleting at least k nodes in batch, to achieve quick termination while sacrificing some approximation ratio. Second, we also propose a heuristic strategy to quickly find the closest truss community in the local neighborhood of query nodes.

5.1 Bulk Deletion Optimization 

In this subsection, we propose a new algorithm called BulkDelete following the framework of Algorithm 1, which is based on deletion of a set of nodes in batch when maintaining a k-truss. The algorithm is described in detail in Algorithm 4, which can terminate more quickly than Algorithm 1. It is based on the following two observations.

First, in Algorithm 1, if a graph G_i has query distance dist_{G_i}(G_i, Q) = d, only one vertex u* with dist_{G_i}(u*, Q) = d is removed from G_i. Instead, we can delete all nodes u with dist_{G_i}(u, Q) = d from G_i in one shot. The reason is that dist_{G_i}(u, Q) is monotone non-decreasing with decreasing graphs, i.e., dist_{G_j}(u, Q) ≥ dist_{G_i}(u, Q) = d, for j > i. Thus, removing a set of vertices L = {u* | dist_{G_i}(u*, Q) ≥ d, u* ∈ G_i} in each iteration i will improve the efficiency. This improvement indeed works in real applications. However, in theory, it is possible that |L| = 1 in every iteration.

Algorithm 4 BulkDelete (G, Q)

Input: A graph G = (V, E), a set of query nodes Q = {q_1, ..., q_r}.
Output: A connected k-truss R with a small diameter.

 1: Find G_0 by FindG0 (G, Q); // see Algorithm 2
 2: d ← +∞; l ← 0;
 3: while connected_{G_l}(Q) = true do
 4:   Compute dist_{G_l}(q, u), ∀q ∈ Q and ∀u ∈ G_l;
 5:   dist_{G_l}(G_l, Q) ← max_{u* ∈ G_l} dist_{G_l}(u*, Q);
 6:   if dist_{G_l}(G_l, Q) < d then
 7:     d ← dist_{G_l}(G_l, Q);
 8:   L = {u* | dist_{G_l}(u*, Q) ≥ d − 1, u* ∈ G_l};
 9:   Maintain k-truss property of G_l; // see Algorithm 3, with V_d = L
10:   G_{l+1} ← G_l; l ← l + 1;
11: R ← argmin_{G' ∈ {G_0, ..., G_{l-1}}} dist_{G'}(G', Q);

Our second observation, shown in the next lemma, is that a vertex u* with dist_{G_i}(u*, Q) = d has at least k − 1 neighbors v with dist_{G_i}(v, Q) ≥ d − 1. If we remove L = {u | dist_{G_i}(u, Q) ≥ d − 1, u ∈ G_i} at each iteration, then the resulting number of iterations is O(n'/k), where n' = |V(G_0)|.

Lemma 6. Algorithm 4 terminates in O(n'/k) iterations.

Proof. In Algorithm 4, at each iteration i, the graph G_i has at least one node u* with dist_{G_i}(u*, Q) = d, which belongs to L, and will be deleted in this iteration (lines 4-10). Since G_i is a connected k-truss and u* ∈ G_i, u* has at least k − 1 neighbors, i.e., |N_{G_i}(u*)| ≥ k − 1. Moreover, ∀v ∈ N_{G_i}(u*), we have dist_{G_i}(v, Q) ≥ d − 1; otherwise, if ∃v ∈ N_{G_i}(u*) with dist_{G_i}(v, Q) < d − 1, we can obtain dist_{G_i}(u*, Q) < d, a contradiction. As a result, we have u* ∈ L and N_{G_i}(u*) ⊆ L, and |L| ≥ k. Thus, at least k nodes are deleted at each iteration, and the algorithm terminates in O(n'/k) iterations. □
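The selection of the deletion set in each iteration can be sketched as follows. This is a minimal illustration of the thresholding step only; the function name `bulk_delete_set`, the `slack` parameter and the example distances are assumptions. With `slack=1` it picks all nodes with query distance at least d − 1, and with `slack=0` only the nodes at distance at least d.

```python
import math

def bulk_delete_set(qdist, d_prev=math.inf, slack=1):
    """Given vertex query distances, update the running minimum d of the graph
    query distance and return (d, nodes with query distance >= d - slack)."""
    d = min(d_prev, max(qdist.values()))
    return d, {u for u, x in qdist.items() if x >= d - slack}

# Illustrative query distances for a small intermediate graph.
qdist = {"q": 0, "a": 1, "b": 2, "c": 3, "e": 3}
d, batch = bulk_delete_set(qdist)
d0, furthest_only = bulk_delete_set(qdist, slack=0)
```

Choosing the larger batch is what yields at least k deletions per iteration in the lemma above, at the cost of possibly removing nodes that a one-at-a-time strategy would have kept.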

Thus, the number of iterations is improved from O(min{n', m'/k}) to O(n'/k) (see Theorem 4). We just proved:

Theorem 5. Algorithm 4 takes O((|Q|t' + ρ')m') time using O(m') space, where t' ∈ O(n'/k), and ρ' ≤ min{d'_max, √m'}.

The approximation quality of Algorithm 4 is characterized below.

Theorem 6. Algorithm 4 is a (2 + ε)-approximation solution of CTC-Problem, where ε = 2/diam(H*).

Proof. To prove this theorem, we only need to ensure dist_R(R, Q) ≤ dist_{H*}(H*, Q) + 1. Because diam(R) ≤ 2 dist_R(R, Q) ≤ 2 dist_{H*}(H*, Q) + 2 ≤ 2(diam(H*) + 1) by Lemma 2, the approximation ratio is 2 + ε, where ε = 2/diam(H*). The detailed proof is similar to that of Lemma 5 and is omitted here due to space limitation. □

Example 7. Continuing with the previous example, we apply Algorithm 4 on Figure 1(a) to find the closest truss community for Q = {q_1, q_2, q_3}. In G_0, we compute d = max_{u ∈ G_0} dist_{G_0}(u, Q) = 4, and L = {q_1, q_3, p_1, p_2, p_3}, as each node u ∈ L has query distance dist_{G_0}(u, Q) ≥ 3 = d − 1. After removing L from G_0, the remaining graph does not contain Q, and the algorithm terminates. Thus, Algorithm 4 reports the entire 4-truss G_0 as the answer, which has diameter 4, compared to the answer of Figure 1(b) reported by Algorithm 1, which has diameter 3. □

5.2 Local Exploration 

In this subsection, we develop a heuristic strategy to quickly find the closest truss community by local exploration. The key idea is as follows. We first form a Steiner tree to connect all query nodes, and then expand it to a graph G_t by involving the local neighborhood of the query nodes. From this new graph G_t, we find a connected k-truss with the highest k containing Q, and then iteratively remove the furthest nodes from this k-truss using the BulkDelete algorithm discussed earlier.

Algorithm 5 Local-CTC (G, Q)

Input: A graph G = (V, E), a set of query nodes Q = {q_1, ..., q_r}, a node size threshold η.
Output: A connected k-truss R with a small diameter.

1: Compute a Steiner tree T containing Q using truss distance functions;
2: k_t ← min_{e ∈ T} τ(e);
3: Expand T to a graph G_t = {e ∈ G | τ(e) ≥ k_t}, s.t. T ⊆ G_t and |G_t| ≤ η;
4: Extract the maximal connected k-truss H_t containing Q from G_t, where k ≤ k_t is the maximum possible trussness;
5: Apply the BulkDelete algorithm on H_t to identify the closest community.

Connect query nodes with a Steiner tree. As explained above, the Steiner tree found is used as a seed for expanding into a k-truss. It is well known that finding a minimum weight Steiner tree is NP-hard, but it admits a 2-approximation [18, 22]. However, a naive application of these algorithms may produce a result with a small trussness. To see this, consider the graph G and the query Q = {q_1, q_2, q_3} in Figure 1(a). Suppose all edges are uniformly weighted. Then it is obvious that the tree T_1 = {(q_2, q_1), (q_1, t), (t, q_3)} with total weight 3 is an optimal (i.e., minimum weight) Steiner tree for Q. However, the smallest trussness of the edges in T_1 is 2, which suggests that growing T_1 into a larger graph will yield a low trussness. By contrast, the Steiner tree T_2 = {(q_1, q_2), (q_2, p_4), (p_4, q_3)} has the total weight 3 and all its edges have trussness at least 4, indicating it could be expanded into a denser graph. To help discriminate between such Steiner trees, we define path weights as follows. Recall the definition of τ̄(S) given earlier.

Definition 7 (Truss Distance). Given a path P between nodes u, v in G, we define the truss distance of u and v as \hat{dist}_P(u, v) = dist_P(u, v) + γ(τ̄(∅) − min_{e ∈ P} τ(e)), where dist_P(u, v) is the path length of P, and γ > 0. For a tree T, by \hat{dist}_T(u, v) we mean \hat{dist}_P(u, v) where P is the path connecting u and v in T.

The difference τ̄(∅) − min_{e ∈ P} τ(e) measures how much the minimum edge trussness of path P falls short of the maximum edge trussness of the graph G, and γ controls the extent to which small edge trussness is penalized. The larger γ is, the more important edge trussness is in distance calculations. Note that, for a special path P consisting of a single edge (u, v), the minimum edge trussness in P is τ(u, v). On the other hand, for a path P of length more than 1, the penalty depends only on the minimum edge trussness of P, and does not account for every edge in P. We define the truss distance on paths in order to leverage the well-known approximation algorithm for Steiner trees. Recall the procedure of that algorithm: given a graph G and query nodes Q, it first constructs a complete distance graph G' on the query nodes, where the distance between two query nodes equals their shortest path length in G, and finds a minimum spanning tree T of G'; it then constructs another graph H by replacing each edge of T by its corresponding shortest path in G; finally, it finds a minimum spanning tree of H and deletes leaf edges. We apply the truss distance function as the path weight for the shortest path and minimum spanning tree construction here. For instance, in the above example, τ̄(∅) = 4, and for γ = 3, the truss distance of (q_2, q_3) in T_1 is \hat{dist}_{T_1}(q_2, q_3) = dist_{T_1}(q_2, q_3) + 3 · (4 − 2) = 3 + 6 = 9, since the minimum edge trussness of T_1 is τ(q_1, t) = 2. On the other hand, \hat{dist}_{T_2}(q_1, q_3) = dist_{T_2}(q_1, q_3) + 3 · (4 − 4) = 3 + 0 = 3. Obviously, the Steiner tree T_2 has smaller truss distances than T_1. It can be verified that its overall weight is smaller than that of T_1.

Table 2: Network statistics (K = 10^3 and M = 10^6)

Network       |V_G|    |E_G|    d_max     τ̄(∅)
Facebook      4K       88K      1,045     97
Amazon        335K     926K     549       7
DBLP          317K     1M       342       114
Youtube       1.1M     3M       28,754    19
LiveJournal   4M       35M      14,815    352
Orkut         3.1M     117M     33,313    78
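The truss distance of a path can be computed directly from the definition. The sketch below is illustrative: the function name is hypothetical, and the per-edge trussness values of the two paths are assumed for the example (the text only fixes the path lengths, the minimum trussness of each path, γ = 3 and the maximum trussness 4).

```python
def truss_distance(path_edge_truss, max_truss, gamma):
    """Truss distance of a path P: length(P) + gamma * (max_truss - min trussness on P).
    path_edge_truss lists the trussness of each edge along P."""
    return len(path_edge_truss) + gamma * (max_truss - min(path_edge_truss))

# Path q2 -> q1 -> t -> q3 in T1: length 3, minimum edge trussness 2 (values assumed).
d_t1 = truss_distance([4, 2, 2], max_truss=4, gamma=3)
# Path q1 -> q2 -> p4 -> q3 in T2: length 3, all edges of trussness 4 (values assumed).
d_t2 = truss_distance([4, 4, 4], max_truss=4, gamma=3)
```

With these inputs the penalty term is 3 · (4 − 2) = 6 for the path in T_1 and 0 for the path in T_2, so the second tree is preferred even though both paths have the same hop length.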

Find G_0 by expanding the Steiner tree to a graph. After obtaining the Steiner tree T for the query nodes, we locally expand the tree to a small graph G_t as follows. We first obtain the minimum trussness of edges in T as k_t = min_{e ∈ T} τ(e). Then, we start from the nodes in T, and expand the tree to a graph in a BFS manner via edges of trussness no less than k_t, and iteratively insert these edges into G_t until the node size reaches a threshold η, i.e., |V(G_t)| ≤ η, where η is empirically tuned. Since G_t is a local expansion of T, the trussness of G_t will be at most k_t, i.e., τ(G_t) ≤ k_t. To ensure the dense cohesive structure of identified communities, we apply a truss decomposition algorithm on G_t. Then, we extract the maximal connected k-truss subgraph H_t containing Q by removing all edges of trussness less than k from G_t, where k ≤ k_t is the maximum possible trussness.
Reduce the diameter of G_0. We take the graph H_t with the maximum trussness k as input, and apply a variant of the BulkDelete algorithm on H_t to return the identified community. The variant differs from the original BulkDelete in the removed vertex set L = {u* | dist_{G_i}(u*, Q) ≥ d − 1, u* ∈ G_i}: we readjust the furthest nodes to be removed as L' = {u* | dist_{G_i}(u*, Q) ≥ d, u* ∈ G_i}. This adjustment makes the algorithm less efficient than BulkDelete in asymptotic running time, but we still find it efficient in practice. On the other hand, in practice, this strategy can achieve a smaller graph diameter than BulkDelete. This new strategy provides a 2-approximation of the optimal. Moreover, in our implementation, in each iteration, we carefully remove only a subset of nodes in L', namely those with the largest total distance from all query nodes. As a result, more nodes with the largest query distance are removed from the community in the end. The reason is as follows: if the largest query distance found is d, then in real-world networks the number of nodes having query distance d may be large, due to the small-world property.

6. EXPERIMENTS 

We conduct experimental studies using 6 real-world networks available from the Stanford Network Analysis Project (snap.stanford.edu), where all networks are treated as undirected. The network statistics are shown in Table 2. All networks except for Facebook contain 5,000 top-quality ground-truth communities.

To evaluate the efficiency and effectiveness of the improved strategies, we test and compare three algorithms proposed in this paper, namely, Basic, BD, and LCTC. Here, Basic is the basic greedy approach in Algorithm 1, which removes a single furthest node at each iteration. BD is the BulkDelete approach in Algorithm 4, which removes multiple furthest nodes at each iteration. LCTC is the local exploration approach in Algorithm 5. For LCTC, we set the parameters η = 1,000 and γ = 3, where η = 1,000 is selected to achieve stable quality and efficiency by testing η in [500, 2,000], and γ = 3 is selected to balance the requirements of trussness and diameter for the communities searched.

Figure 5: DBLP: varying query size |Q|. Panels: (a) Query Time, (b) The percentage, (c) Density.

Figure 6: Facebook: varying query size |Q|. Panels: (a) Query Time, (b) The percentage, (c) Density.

Figure 7: DBLP: varying query vertices. Panels: (a) Query Time, (b) The percentage, (c) Density.

Figure 8: Facebook: varying query vertices. Panels: (a) Query Time, (b) The percentage, (c) Density.

We randomly generate sets of query nodes to test. Three parameters, query size |Q|, degree rank Q_d, and inter-distance l, are used for generating query nodes with varied values. Here, |Q| is the number of query nodes, which is set to 3 by default. Q_d is the degree rank of query nodes. We sort all vertices in descending order of their degrees in a network. A node is said to have degree rank X% if its degree is among the top X% highest degrees in the network. The default value of Q_d is 80%, which means that a query node has degree higher than that of 20% of the nodes in the whole network. The inter-distance l is the inter-distance between all query nodes. The default l = 2 indicates that all query nodes are within distance 2 of each other in the network.

For the efficiency, we report runtime in seconds. We treat the 
runtime of a query as infinite if its runtime exceeds 1 hour. 

For the effectiveness of eliminating “free riders”, we compare our methods with Truss (Algorithm 2), which finds the connected k-truss graph containing query nodes with the largest k only.

Let G_R be the closest truss community found by LCTC and G_0 be computed by Truss. We report two things. One is the percentage of nodes that are kept in the resulting community, |V(G_R)|/|V(G_0)|. The smaller the percentage, the more “free riders” are removed. The other is the edge density 2|E(g)|/(|V(g)|(|V(g)| − 1)), where g is either G_R or G_0.
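Both reported measures are simple ratios and can be computed as below. This is an illustrative sketch: the function names are hypothetical, and the kept-node percentage is assumed to be the ratio of the node count of the returned community to that of the k-truss found by Truss.

```python
def edge_density(num_nodes, num_edges):
    """Edge density 2|E| / (|V| (|V| - 1)) of an undirected simple graph."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

def kept_percentage(community_nodes, truss_nodes):
    """Percentage of the nodes of G_0 that remain in the returned community."""
    return 100 * len(community_nodes) / len(truss_nodes)

# Illustrative values: a 4-clique kept out of an 8-node k-truss.
density = edge_density(num_nodes=4, num_edges=6)
kept = kept_percentage({1, 2, 3, 4}, {1, 2, 3, 4, 5, 6, 7, 8})
```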

In addition, to evaluate the quality of the closest truss community found, we implemented two state-of-the-art community search methods: the minimum degree-based community search (MDC), which globally finds the dense subgraph containing all query nodes with the highest minimum degree under the distance and size constraints, and the query biased densest community search (QDC), which shifts the detected community to the neighborhood of the query by integrating the edge density and the nodes' proximity to the query nodes. Here, MDC and QDC are implemented using the same data structures, such as graph, Steiner tree and hashtable, as we do for LCTC. To compare LCTC with MDC and QDC, we test the datasets with ground-truth, and show the F1-score to measure the alignment between a discovered community C and a ground-truth community Ĉ. Here, F1 is defined as

F1(C, Ĉ) = 2 · prec(C, Ĉ) · recall(C, Ĉ) / (prec(C, Ĉ) + recall(C, Ĉ)),

where prec(C, Ĉ) = |C ∩ Ĉ| / |C| is the precision and recall(C, Ĉ) = |C ∩ Ĉ| / |Ĉ| is the recall.
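The F1-score between a discovered community and a ground-truth community can be computed on node sets as follows. This is a minimal sketch using the standard precision and recall definitions; the function name and the zero-overlap convention are assumptions.

```python
def f1_score(found, truth):
    """F1 between a discovered node set and a ground-truth node set."""
    found, truth = set(found), set(truth)
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0                      # convention when precision = recall = 0
    prec = overlap / len(found)         # fraction of found nodes that are correct
    rec = overlap / len(truth)          # fraction of true nodes that were found
    return 2 * prec * rec / (prec + rec)

# Illustrative sets: precision 2/4, recall 2/6.
score = f1_score({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
```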

All algorithms are implemented in and all the experiments 
are conducted on a Linux server with an Intel Xeon CPU X5570 (2.93 GHz) and 50GB main memory.


Exp-1 Different Queries: We test our approaches using different 
queries on DBLP and Facebook (Tablet- 

First, we vary the query size j Q j. We test 5 different j Q j in {1, 2, 

4, 8 , 16}. For each [Qj, we randomly select 100 sets of \Q\ query 
nodes, and we report the average runtime, the average percentage of 
avoiding FRE and the average edge density^ The results for DBLP 
and Facebook are shown in Figure [3 and Figure | 6 ] respectively. 
LCTC outperforms the best in terms of efficiency, the percentage 
of avoiding FRE, and edge density in all cases. Basic cannot find 
communities in DBLP in 1 hour limit. BD achieves better effi¬ 
ciency in Facebook than DBLP. This is because Facebook contains 
only 4K vertices and the global method BD is effective on such a 
small network. However, BD performs worse than Basic for the 
percentage of avoiding FRE and density for Facebook. 

Second, we vary the degree of query nodes. For a graph to be 
tested, we sort the vertices in descending order of their degrees, 
and partition them into 5 equal-sized buckets. For each bucket, we 
randomly select 100 different query sets of size 3, and we report 
the average runtime, the average percentage of avoiding FRE and 
the average density. The results for DBLP and Facebook are shown 
in Figure 13 and Figure [S] respectively. In terms of runtime, the 
percentage of avoiding FRE and density, the performance are simi¬ 
lar to the results by varying the query sizes. LCTC outperforms the 
others. 

Third, we vary the inter-distance l within query nodes from 1 to
5. For each value of l, we randomly select 100 sets of 3 query
nodes in which the inter-distance of query nodes is l. We report
the average runtime, the average percentage of avoiding FRE, and
density. The results for DBLP and Facebook are shown in Figure 9
and Figure 10, respectively. The performance in terms of runtime,
the percentage of avoiding FRE, and density is similar to the
results observed above. For all methods, the percentage increases
as the inter-distance l increases. This is because the diameter of
the community increases, and therefore fewer nodes can be removed
from the graph. LCTC outperforms the others.


Table 3: Index size and index construction time

Network       Graph Size (MB)   Index Size (MB)   Index Time (s)
Facebook                  0.9               1.3              7.4
Amazon                     12                19              6.7
DBLP                       13                20               14
Youtube                    37                59               76
LiveJournal               478               666            2,142
Orkut                   1,640             2,190           21,012

Note that Basic, BD, and LCTC are not optimal algorithms.
Figure 9: DBLP: varying inner distance l. Panels: (a) Query Time, (b) The percentage, (c) Density.

Figure 10: Facebook: varying inner distance l. Panels: (a) Query Time, (b) The percentage, (c) Density.

We report the simple k-truss index in terms of index size
(megabytes) and index construction time (seconds) in Table 3. The
size of the k-truss index is roughly 1.3 to 1.6 times the original
graph size, which is consistent with the O(m) space complexity of
the simple k-truss indexing scheme and shows that the index is
compact. Index construction takes under 80 seconds on Facebook,
Amazon, DBLP, and Youtube, and 2,142 and 21,012 seconds on
LiveJournal and Orkut, respectively.

Exp-2 A Case Study on DBLP: We construct a collaboration network
from the raw DBLP data set for a case study. A vertex represents
an author, and an edge between two authors indicates they have
co-authored at least 3 times. This DBLP graph contains 234,879
vertices and 541,814 edges.

We use the query Q = {“Alon Y. Halevy”, “Michael J. Franklin”,
“Jeffrey D. Ullman”, “Jennifer Widom”} to test our closest truss
community model. Figure 11(a) shows G0, the maximal connected
9-truss containing Q. This entire graph has 73 nodes, 486 edges,
an edge density of 0.18, and a diameter of 4. As we can see, most
black nodes are far from one another, and they are only loosely
connected to the query nodes through intermediate nodes. Our
method LCTC removes these black nodes and finds a closest truss
community for Q, shown in Figure 11(b), which is a 9-truss of
diameter 2. It has 14 authors, 81 edges, and an edge density of
0.89. The community excludes the other authors in the 9-truss, who
are far away from the queried authors.

Exp-3 The Quality by Ground-Truth: To evaluate the effectiveness
of different community models, we compare LCTC with three other
methods, MDC, QDC, and Truss, using the 5 networks with
ground-truth communities: DBLP, Amazon, Youtube, LiveJournal, and
Orkut. We randomly select query nodes that appear in a unique
ground-truth community, and select 1,000 sets of such query nodes
with the size randomly ranging from 1 to 16. We evaluate the
accuracy by the F1-score of the detected community, and report the
F1-score averaged over all query cases.

Figure 12(a) shows the F1-score. Our method achieves the highest
F1-score on most networks. QDC has the second best performance and
outperforms LCTC on the Youtube network. MDC does not perform well
due to its fixed distance and size constraints. We observe that
the accuracy drops on Orkut for most methods. One possible reason
is that many ground-truth communities in Orkut are not densely
connected, which violates the assumption of all dense community
models. Another reason is that the community membership per node
on Orkut is much larger than that on other networks. The large
overlap of ground-truth communities makes them difficult to detect
accurately. Figure 12(b) shows that LCTC runs much faster than MDC
and QDC, and is close to Truss. Figure 12(c) shows the size of
communities detected by LCTC and Truss, in terms of the number of
vertices and edges. As we can see, the number of nodes (|V|) and
the number of edges (|E|) in communities detected by LCTC are much
smaller than those by Truss on all networks. This confirms the
power of LCTC in eliminating irrelevant nodes from discovered
communities.

http://dblp.uni-trier.de/xml/

Figure 11: Community search on DBLP network using query
Q = {“Alon Y. Halevy”, “Michael J. Franklin”, “Jeffrey D. Ullman”,
“Jennifer Widom”}. Panels: (a) G0, (b) LCTC.

Figure 12: Quality evaluation on networks with ground-truth
communities. Panels: (a) F1 score, (b) Query time, (c) Reduction.

Exp-4 Diameter and Trussness Approximation: We evaluate the
diameter approximation of communities detected by our methods on
the Facebook network. Here, we take the lower bound of the optimal
diameter (LB-OPT) as the smallest query distance dist_R(R, Q),
where R is the community detected by method Basic. We also show
the curve of 2·dist_R(R, Q), which serves as an upper bound on the
smallest diameter (UB-OPT), as shown by a lemma earlier in the
paper. The averaged diameters of communities detected by different
methods are reported in Figure 13(a), where we vary the
inter-distance l. The diameters of detected communities obtained
by all our methods are very close to the lower bound of the
optimal one. Figure 13(b) shows the maximum trussness of
communities detected by our methods. Basic and BD globally search
the k-truss containing the query nodes on the entire graph, and
the detected communities have the maximum trussness k. LCTC
detects communities whose trussness is very close to that of Basic
and BD, while searching only over a small graph locally. LCTC thus
balances efficiency and effectiveness well.
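To make the two bounds used in this experiment concrete, the
following sketch computes the query distance of a community by
breadth-first search and compares it with the exact diameter. It
assumes an unweighted graph stored as a dictionary adj from each
vertex to its neighbors, and it defines the query distance as the
largest shortest-path distance from any vertex to any query node;
the function names are hypothetical. The upper bound follows from
the triangle inequality through any query node.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def query_distance(adj, query):
    """Largest distance from any vertex to any query node (inf if disconnected)."""
    worst = 0
    for q in query:
        dist = bfs_distances(adj, q)
        if len(dist) < len(adj):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def diameter(adj):
    """Exact diameter: the query distance when every vertex is a query node."""
    return query_distance(adj, list(adj))


# Path 0-1-2-3-4 with the middle vertex as the only query node.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
lb = query_distance(path, [2])      # plays the role of LB-OPT
assert lb <= diameter(path) <= 2 * lb
```

The exact diameter computation runs one search per vertex, so this
check is only practical on small communities.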

Exp-5 Varying Maximum Trussness k: In this experiment, we evaluate
a variant of LCTC that does not find the truss community with the
real maximum trussness, but with a given maximum value k. We test
different values of k ranging from 2 to “max”, which is the real
largest possible trussness. The diameter of the community found by
LCTC is reported in Figure 14. As k decreases, the lower bound of
the optimal diameter also decreases from 3.6 to 3.0, but the
margin is small. Meanwhile, the diameters of the communities
detected by LCTC are very close to the optimal one for any k; the
approximation ratio is not greater than 1.2. This indicates that
our model with the maximum trussness constraint has the advantage
of being parameter-free.

Exp-6 Varying LCTC parameters: In this experiment, we test the
performance of LCTC by varying the parameters η and γ. We use the
same query nodes that are selected in Exp-3 on the DBLP network.
Similar results can also be observed on the other 4 networks in
this paper. The default setting for LCTC is η = 1000 and γ = 3.
For the parameter η, we first vary it from 100 to 2000. The
results for F1-score, the number of community vertices |V|, and
the running time are reported in Figure 15. As we can see, the
number of community vertices increases when η increases from 100
to 500, and then stays stable for larger η. This shows that the
default setting η = 1000 is large enough. Moreover, the F1-score
and running time of LCTC remain stable as η varies. We also test
the parameter γ, and report the results in Figure 16. The number
of community vertices increases with increasing γ, because LCTC
with a larger γ can detect a community of larger trussness, and
the number of vertices to be removed is reduced. On the other
hand, the F1-score increases with increasing γ at first, but it
drops slightly when γ increases further. The running time of LCTC
remains stable.

Figure 13: Varying the inner distance l on Facebook. Panels: (a) Diameter, (b) Trussness.

Figure 14: Diameter vs. the maximum trussness k on Facebook

7. RELATED WORK AND DISCUSSION 

In this section, we first discuss the rationale of our designed
model, and then review the work most closely related to our study,
which covers community search, community detection, and dense
subgraph mining.

7.1 Design Decisions 

Here, we discuss several natural candidates for community models
and provide a rationale for our definition of closest truss
community.

Diameter vs query distance. Being closely related to the query
nodes is a natural desirable property for nodes to be included in
a community. In the literature, small diameter has been regularly
considered an important hallmark of a good community. Thus,
minimizing diameter in identifying communities has a natural
motivation.

Secondly, by definition, a community with a small diameter will
also have small query distance from its nodes. On the other hand,
minimizing query distance ignores the distance between non-query
nodes in the community. In this sense, small diameter is a strictly
stronger property than small query distance. An earlier example
illustrates this point and the value of minimizing diameter as
opposed to just query distance.

Trading trussness for diameter. Every k-truss is also a (k − 1)-
truss by definition. Thus, relaxing the maximum trussness
requirement may allow us to find a community with a smaller
diameter by sacrificing trussness. One problem is that the
variation of diameter as trussness decreases may not be smooth:
the diameter may drop suddenly only when trussness decreases to a
low value. E.g., continuing with the example of Figure 1(a) with
query Q = {q1, q2, q3}, our CTC model yields a community with the
highest trussness k = 4 and diameter 3, as in Figure 1(b). When
k = 3, the 3-truss containing Q with the smallest diameter is
still Figure 1(b). However, when k = 2, the cycle of {(q1, t),
(t, q3), (q3, v4), (v4, q2), (q2, q1)} turns out to be the 2-truss
containing Q, and its diameter is 2. However, this cycle is
loosely connected and has a low edge density. In general, for a
small k, the k-truss community found by removing free riders may
have a loosely connected structure and thus may be noisy. One
advantage of our approach is that it is parameter-free. However,
if a user would like to explore trading trussness for diameter, it
is straightforward to extend our algorithms (Algorithms 1 and 2)
to treat the desired trussness k as a constraint instead of
maximizing trussness. Finally, another way of combining trussness
and diameter is using a weighted combination, but this comes with
the challenge of tuning the weights. Our parameter-free approach
of minimizing diameter while keeping trussness at the maximum
value is a reasonable choice.

Figure 15: DBLP: varying parameter η of LCTC. Panels: (a) |V|-LCTC, (b) F1-Score, (c) Query Time.

Figure 16: DBLP: varying parameter γ of LCTC. Panels: (a) |V|-LCTC, (b) F1-Score, (c) Query Time.
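To illustrate how trussness constrains a subgraph, the sketch below
extracts the k-truss of a small graph by repeatedly deleting edges
that lie in fewer than k − 2 triangles. It is a deliberately simple
fixed-point peeling loop written for clarity, not the index-based
procedure used by our algorithms; the function name k_truss and the
edge-list input format are assumptions made for this example.

```python
def k_truss(edges, k):
    """Edges of the k-truss: each surviving edge lies in >= k - 2 triangles."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Support of (u, v) = number of common neighbors of u and v.
                if v in adj[u] and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}


# A 5-cycle is a 2-truss but contains no triangle, so its 3-truss is empty.
cycle = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
assert len(k_truss(cycle, 2)) == 5
assert k_truss(cycle, 3) == set()
```

The 5-cycle mirrors the k = 2 example above: it satisfies the 2-truss
condition trivially, which is why lowering k can admit loosely
connected communities.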

Constraining community size. At first, it appears that we can
minimize or avoid free riders by bounding the size of a community.
However, sizes of communities may vary widely and it is difficult,
if possible at all, to impose proper bounds on acceptable community
sizes. Moreover, bounding the size of the community may render
the problem of finding a query driven community inapproximable
w.r.t. any factor. Specifically, consider the special case of
finding a k-truss of size at most a given parameter that contains
Q = ∅. This subsumes the k-clique problem, which is not
approximable within any reasonable factor. By contrast, minimizing
diameter instead of size admits efficient approximation. Indeed,
our formulation does address community size indirectly. The larger
the k, the smaller the size of the k-truss. Our CTC model
maximizes trussness. Furthermore, by minimizing the diameter, it
helps remove free riders, thus reducing the size in a disciplined
manner. On the algorithmic side, our LCTC method (Section 5.2)
actually uses a size threshold to prune the search space and
improve efficiency. Thus, LCTC controls the size of a community in
a heuristic manner.

7.2 Community Search 

Recently, several community search models have been studied,
including k-truss [17], quasi-clique [9], k-core, influential
community [20], and query biased densest subgraph [32]. Here, we
compare these models with our proposed closest truss community
model w.r.t. three aspects: (i) consideration of query nodes, (ii)
cohesive structure, and (iii) quality approximation.

Query nodes. Cui et al. [9] have recently studied the problem of
online search of overlapping communities for a query node by
designing a new α-adjacency γ-quasi-k-clique model. Huang et al.
[17] propose a k-truss community model based on triangle
adjacency, to find all overlapping communities of a query node.
They ignore the diameter of the resulting community. Cui et al.
[10] find a k-core community for a query node using local search.
In addition, the influential community model [20] finds the top-r
communities with the highest influence scores over the entire
graph; no query nodes are considered. Extending any of the above
models from one (or zero) query node to multiple query nodes
raises new challenges. First, for the models with one query node
and a parameter k, the search algorithm can easily start from this
node to find qualified subgraphs. For multiple nodes, it is
non-trivial for the search algorithm to determine the start point
and search directions that can quickly connect all query nodes.
Second, for a given parameter k, a connected dense subgraph
containing all query nodes may not exist. Thus, the search
algorithm must automatically determine the proper k for different
query nodes.

Cohesive structure. [27] and [32] support community search of
multiple query nodes similarly to us, thus they are most related to
our work. Sozio et al. [27] proposed a k-core based community
model, called the Cocktail Party model, with distance and size
constraints. Our proposed closest truss community model is based on
connected k-truss. Conceptually, k-truss is a more cohesive
definition than k-core, as k-truss is based on triangles whereas
k-core simply considers node degree [29]. Most recently, Wu et al.
[32] studied the query biased densest connected subgraph (QDC)
problem for avoiding subgraphs irrelevant to query nodes in the
community found. While QDC is also defined based on a connected
graph containing Q similarly to CTC, it optimizes a fundamentally
different function called query biased edge density, which is
calculated as the overall edge weight averaged over the weight of
nodes in a community.

Quality approximation. Both problems proposed in [27] and [32]
are NP-hard to compute, and do not admit approximations without
further assumptions. [32] gives an approximation solution of QDC
by relaxing the problem. Unfortunately, as the authors show
themselves [32], this could fail in real applications, for two
reasons. First, the algorithm may find a solution consisting of
several connected components with query nodes split between them.
Second, the approximation factor can be large, and can deteriorate
further with a larger number of query nodes. In contrast, we
provide an efficient 2-approximation algorithm for finding the
closest truss community containing any set of query nodes. We also
provide a heuristic algorithm based on local exploration which
significantly improves the efficiency, and show that on several
real networks it delivers a high-quality solution.

7.3 Community Detection 

The goal of community detection is to identify all communities
in the entire network. A typical method for finding communities
is to optimize the modularity measure [23]. Generally, community
detection falls into two major categories: non-overlapping
[24, 26, 38] and overlapping community detection [25, 35]. All
these methods consider static communities, where the networks are
partitioned a priori. Query nodes are not considered since their
focus is not community search. Several community detection
methods have also been surveyed and evaluated using rigorous
tests. [34] proposes an online distributed algorithm for community
detection in dynamic networks using label propagation. As such,
these works on community detection are significantly different
from our goal of query driven community search.

7.4 Dense Subgraph Mining


There is a very large body of work on mining dense subgraph
patterns, including clique [30], quasi-clique [28], k-core,
k-truss, and dense neighborhood graph [31], to name a few.

Clique and quasi-clique enumeration methods include the classical
algorithm, the external-memory H*-graph algorithm [5],
redundancy-aware clique enumeration, maximum clique computation
using MapReduce, and optimal quasi-clique mining. Various studies
have been done on core decomposition and truss decomposition in
different settings, including in-memory algorithms,
external-memory algorithms, and MapReduce. [17, 39] designed
incremental algorithms for updating a k-truss with edge
insertions/deletions. Wang et al. studied a dense neighborhood
graph based on common neighbors. None of these works considers
query nodes, which, as we have discussed earlier, raise major
computational challenges.

8. CONCLUSION 

In this paper, we study the closest truss community search problem
over a graph: given a set of query nodes, find a densely connected
community in which nodes are close to each other. Based on the
dense subgraph definition of a k-truss, we formulate the CTC as a
connected k-truss subgraph that contains the query nodes, has the
largest k, and has the minimum diameter among such subgraphs. We
showed the problem is NP-hard and is NP-hard to approximate within
a factor better than 2. We also matched this lower bound by
developing a greedy algorithmic framework that provides a
2-approximation to the optimal solution. To support the efficient
search of a CTC, we make use of a truss index and develop
efficient methods of truss identification and maintenance.
Furthermore, we improve the efficiency of the greedy framework
using the bulk deletion optimization and local exploration
strategies. Extensive experimental results on large real-world
networks with ground-truth communities demonstrate the
effectiveness and efficiency of our proposed community search
model and solutions.

It would be interesting to extend our search model and algorithms
to directed graphs. Given the recent surge of interest in
probabilistic graphs, an exciting question is how k-truss
generalizes to probabilistic graphs. The challenge is to develop
extensions that are widely useful and tractable. Last but not
least, it would be interesting to extend the notions and
techniques to networks with interactions between nodes.